Breath-Detection-Based Telephony Speech Phrasing

Authors

Takashi Fukuda

Osamu Ichikawa

Masafumi Nishimura

Abstract

Automatic speech recognition (ASR) has long attracted attention for call center monitoring systems. In ASR for call center conversations, the system usually divides the input signal into separate utterances and removes unneeded silence before recognizing the detected utterances. The input signal should therefore be split into utterances of a length that suits both ASR performance and readability. However, typical voice activity detection (VAD) techniques sometimes generate overly long speech segments because they consider only the length of the pause (non-speech) between sentences. In contrast, it has been shown that speakers typically take breaths when speaking more than one sentence or a long sentence, and these breaths are highly correlated with major prosodic breaks. In this paper, we focus on breath events in the pause intervals and attempt to split the input signal into utterances by detecting them. The proposed method leverages acoustic information specialized for breathing sounds in a two-step approach that detects breath events with an accuracy of 97.4%. In addition, proper speech phrasing based on breath events improved the word error rate in ASR.
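The splitting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the two-step breath detector, so the `has_breath` flag below stands in for the output of some hypothetical detector, and the `Pause` type, the function names, and the 0.8 s pause threshold are all assumptions made for illustration. The sketch cuts the signal at any pause that either contains a breath or is long enough for a pause-length-only rule to fire.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pause:
    start: float       # pause start time in seconds
    end: float         # pause end time in seconds
    has_breath: bool   # assumed output of a hypothetical breath detector


def phrase_boundaries(pauses: List[Pause], max_pause: float = 0.8) -> List[float]:
    """Return cut times (pause midpoints) for pauses that contain a breath
    or last at least max_pause seconds (an assumed threshold)."""
    cuts = []
    for p in pauses:
        if p.has_breath or (p.end - p.start) >= max_pause:
            cuts.append((p.start + p.end) / 2.0)
    return sorted(cuts)


def split_signal(total: float, pauses: List[Pause],
                 max_pause: float = 0.8) -> List[Tuple[float, float]]:
    """Split the interval [0, total] into utterance spans at the cut times."""
    edges = [0.0] + phrase_boundaries(pauses, max_pause) + [total]
    return list(zip(edges[:-1], edges[1:]))


pauses = [
    Pause(2.0, 2.25, False),   # short pause, no breath: not a boundary
    Pause(5.0, 5.5, True),     # short pause with a breath: boundary
    Pause(9.0, 10.0, False),   # long pause: boundary by length alone
]
segments = split_signal(12.0, pauses)
```

In this example a pause-length-only rule would cut once, at the 1.0 s pause, whereas the breath flag adds a second boundary at the 0.5 s pause and so shortens the first segment.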

Related resources

Style-Specific Phrasing in Speech Synthesis

People pause between words and sentences when they speak. They pause to emphasize content, or to make an utterance more understandable, or just to take a breath. A speech synthesizer should also insert similar pauses to sound natural. The process of inserting prosodic breaks in an utterance is called Phrasing. Phrasing is a crucial step during speech synthesis because other models of prosody de...

Intonation patterns in older children with cerebral palsy before and after speech intervention.

PURPOSE This paper examined the production of intonation patterns in children with developmental dysarthria associated with cerebral palsy (CP) prior to and after speech intervention focussing on respiration and phonation. The study further sought to establish whether intonation performance might be related to changes in speech intelligibility. METHOD Intonation patterns were examined using c...

Speech recognition with automatic punctuation

We present a method of speech recognition with automatic punctuation based on a combination of acoustic and lexical evidence. In the recognizer vocabulary, punctuation marks are treated as word entries. By assigning the acoustic baseforms of silence, breath, and other non-speech sounds to punctuation marks, and using a properly processed N-gram language model, unpronounced punctuation marks of ...

Several Aspects of Machine-Driven Phrasing in Text-to-Speech Systems

The article discusses differences between a priori and a posteriori phrasing and their importance in the task of automatic prosodic phrasing in text-to-speech systems. On several examples it illustrates shortcomings of common evaluation of a priori phrasing performance using a posteriori phrasing of referential corpus data. The paper also proposes and evaluates a method for a priori phrasing ba...

Acoustic Cues for Automatic Determination of Phrasing

This paper proposes a framework of automatic determination of phrasing using acoustic features derived from the speech signal. The feature vectors were defined in a series of analyses investigating the acoustic-phonetic realization of minor and major phrase boundaries and different boundary types. The resulting representation was used to train statistical classifiers to automatically determine ...

Publication date: 2011
